Subsampling-based HMC parameter estimation with application to large datasets classification
Abstract
This paper presents a contextual algorithm for approximating Baum’s forward and backward probabilities, which are used extensively for parameter estimation in hidden Markov chain (HMC) models. The method differs from the original algorithm in that it takes into account only a neighborhood of limited length, rather than all the data in the chain, for computations. It then becomes possible to propose a bootstrap subsampling strategy for computing the forward and backward probabilities, which greatly reduces the computation time and memory required for EM-based parameter estimation. Comparative experiments regarding the neighborhood size and the bootstrap sample size are conducted by means of unsupervised classification error rates. The practical interest of such an algorithm is then illustrated through the segmentation of large images; classification results confirm the validity and accuracy of the proposed algorithm while greatly reducing computation and memory requirements.
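The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian emission model, the symmetric window of half-length `L`, and the choice to restart each window from the initial distribution `pi` are all assumptions made for illustration. The sketch computes the posterior state probabilities at a site from a limited neighborhood only, then evaluates them on a bootstrap sample of `M` sites instead of the whole chain.

```python
import numpy as np

def gaussian_likelihoods(y, means, stds):
    """Emission densities b_k(y_n) as an (N, K) array (assumed Gaussian model)."""
    y = np.asarray(y, dtype=float)[:, None]
    return np.exp(-0.5 * ((y - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))

def posterior_exact(lik, A, pi):
    """Standard normalized forward-backward pass over the whole chain."""
    N, K = lik.shape
    alpha = np.empty((N, K))
    beta = np.ones((N, K))
    a = pi * lik[0]
    alpha[0] = a / a.sum()
    for n in range(1, N):
        a = (alpha[n - 1] @ A) * lik[n]
        alpha[n] = a / a.sum()
    for n in range(N - 2, -1, -1):
        b = A @ (lik[n + 1] * beta[n + 1])
        beta[n] = b / b.sum()
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)

def posterior_windowed(lik, A, pi, n, L):
    """Posterior at site n using only sites n-L .. n+L (hypothetical variant)."""
    N = lik.shape[0]
    lo, hi = max(0, n - L), min(N - 1, n + L)
    # Forward recursion restarted at the left edge of the window.
    a = pi * lik[lo]
    a = a / a.sum()
    for t in range(lo + 1, n + 1):
        a = (a @ A) * lik[t]
        a = a / a.sum()
    # Backward recursion started at the right edge of the window.
    b = np.ones(A.shape[0])
    for t in range(hi - 1, n - 1, -1):
        b = A @ (lik[t + 1] * b)
        b = b / b.sum()
    g = a * b
    return g / g.sum()

def bootstrap_posteriors(lik, A, pi, M, L, rng):
    """Draw M sites with replacement and compute windowed posteriors there."""
    sites = rng.choice(lik.shape[0], size=M, replace=True)
    post = np.array([posterior_windowed(lik, A, pi, n, L) for n in sites])
    return sites, post

# Small demonstration on a simulated two-state chain (all values are made up).
rng = np.random.default_rng(0)
A = np.array([[0.95, 0.05], [0.05, 0.95]])
pi = np.array([0.5, 0.5])
means, stds = np.array([0.0, 2.0]), np.array([1.0, 1.0])
N = 2000
states = np.empty(N, dtype=int)
states[0] = rng.choice(2, p=pi)
for n in range(1, N):
    states[n] = rng.choice(2, p=A[states[n - 1]])
y = rng.normal(loc=means[states], scale=1.0)
lik = gaussian_likelihoods(y, means, stds)
sites, post = bootstrap_posteriors(lik, A, pi, M=200, L=15, rng=rng)
print(post.shape)
```

In this sketch the cost per sampled site grows with the window length rather than with the chain length, so evaluating `M` sites costs on the order of `M * L` recursion steps instead of `N`. When the window covers the whole chain, `posterior_windowed` reduces to the exact forward-backward posterior.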
Similar resources
Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling.
We describe a method for estimating species trees that relies on replicated subsampling of large data matrices. One application of this method is phylogeographic research, which has long depended on large datasets that sample intensively from the geographic range of the focal species; these datasets allow systematists to identify cryptic diversity and understand how contemporary and historica...
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...
On the overestimation of random forest’s out-of-bag error
Background: The ensemble method random forests has become a popular classification tool in bioinformatics and related fields. The out-of-bag error is an error estimation technique which is often used to evaluate the accuracy of a random forest as well as for selecting appropriate values for tuning parameters, such as the number of candidate predictors that are randomly drawn for a split, referre...
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
A Pre-Trained Ensemble Model for Breast Cancer Grade Detection Based on Small Datasets
Background and Purpose: Nowadays, breast cancer is reported as one of the most common cancers amongst women. Early detection of the cancer type is essential to aid in informing subsequent treatments. The newest proposed breast cancer detectors are based on deep learning. Most of these works focus on large datasets and are not developed for small datasets. Although the large datasets might lead ...
Journal title:
- Signal, Image and Video Processing
Volume 8, issue
Pages -
Publication date 2014